Imputation using external reference panels is a widely used approach for increasing power in GWAS and meta-analysis. Existing HMM-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants (increasing to 87% (60%) when summary LD information is available from target samples) versus 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and is computationally very fast. As an empirical demonstration, we apply our method to 7 case-control phenotypes from the WTCCC data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of $\chi^2$ association statistics) compared to HMM-based imputation from individual-level genotypes at the 227 (176) published SNPs in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of 4 lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic vs. non-genic loci for these traits, as compared to an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.
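To make the idea of Gaussian imputation from summary statistics concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that z-scores at typed and untyped SNPs are jointly Gaussian with covariance given by the reference-panel LD matrix, so that untyped z-scores can be estimated by their conditional mean. The function name `impute_z`, its arguments, and the `ridge` term are hypothetical; in particular, adding a ridge term to the diagonal is only one plausible way to account for the limited sample size of the reference panel, and its default value here is arbitrary.

```python
import numpy as np


def impute_z(z_typed, ld_tt, ld_it, ridge=0.1):
    """Conditional-mean imputation of z-scores at untyped SNPs (illustrative).

    z_typed : (T,) observed z-scores at typed SNPs
    ld_tt   : (T, T) reference-panel correlations among typed SNPs
    ld_it   : (I, T) reference-panel correlations, untyped vs. typed SNPs
    ridge   : hypothetical regularization constant added to the diagonal
              (an assumption of this sketch, not a value from the paper)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_it = np.asarray(ld_it, dtype=float)

    # Regularized LD among typed SNPs.
    reg = ld_tt + ridge * np.eye(len(z_typed))

    # Weights W = ld_it @ inv(reg), computed with a linear solve
    # (reg is symmetric, so solving against ld_it.T and transposing is valid).
    weights = np.linalg.solve(reg, ld_it.T).T

    # Conditional mean of the untyped z-scores given the typed ones.
    z_imputed = weights @ z_typed

    # Per-SNP variance of the imputed z-score under the regularized null
    # model, diag(W @ reg @ W.T) = diag(W @ ld_it.T); low values flag
    # poorly predicted SNPs.
    r2_pred = np.einsum("ij,ij->i", weights, ld_it)
    return z_imputed, r2_pred
```

In this sketch, an imputed z-score could be divided by the square root of `r2_pred` to restore unit variance under the null before it is interpreted as an association statistic; SNPs with small `r2_pred` carry little information and would typically be filtered out.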